Whole-genome sequencing analysis reveals new susceptibility loci and structural variants associated with progressive supranuclear palsy

Background: Progressive supranuclear palsy (PSP) is a rare neurodegenerative disease characterized by the accumulation of aggregated tau proteins in astrocytes, neurons, and oligodendrocytes. Previous genome-wide association studies for PSP were based on genotype arrays and were therefore inadequate for the analysis of rare variants and of larger mutations, such as small insertions/deletions (indels) and structural variants (SVs). Methods: In this study, we performed whole genome sequencing (WGS) and conducted association analysis for single nucleotide variants (SNVs), indels, and SVs in a cohort of 1,718 cases and 2,944 controls of European ancestry. Of the 1,718 PSP individuals, 1,441 were autopsy-confirmed and 277 were clinically diagnosed. Results: Our analysis of common SNVs and indels confirmed known genetic loci at MAPT, MOBP, STX6, SLCO1A2, DUSP10, and SP1, and further uncovered novel signals in APOE, FCHO1/MAP1S, KIF13A, TRIM24, TNXB, and ELOVL1. Notably, in contrast to Alzheimer's disease (AD), we observed the APOE ε2 allele to be the risk allele in PSP. Analysis of rare SNVs and indels identified a significant association in ZNF592, and further gene network analysis identified a module of neuronal genes dysregulated in PSP. Moreover, seven common SVs associated with PSP were observed in the H1/H2 haplotype region (17q21.31) and other loci, including IGH, PCMT1, CYP2A13, and SMCP. In the H1/H2 haplotype region, there is a burden of rare deletions and duplications (P = 6.73 × 10⁻³) in PSP. Conclusions: Through WGS, we significantly enhanced our understanding of the genetic basis of PSP, providing new targets for exploring disease mechanisms and therapeutic interventions. Supplementary Information: The online version contains supplementary material available at 10.1186/s13024-024-00747-3.


Background
Progressive supranuclear palsy (PSP) is a neurodegenerative disease that is pathologically defined by the accumulation of aggregated tau protein in multiple cortical and subcortical regions, especially the basal ganglia, dentate nucleus of the cerebellum, and midbrain [1]. An isoform of tau harboring four repeats of the microtubule-binding domain (4R-tau) is particularly prominent in these tau aggregates [2]. Clinical manifestations of PSP span a range of phenotypes. The initially described and most common is PSP-Richardson syndrome, which presents with multiple features, including postural instability, vertical supranuclear palsy, and frontal dementia. However, there are several other phenotypes, such as PSP-Parkinsonism, PSP-frontotemporal dementia, PSP-freezing of gait, and PSP-speech and language disturbances [3]. Presentation of these phenotypes varies widely depending on the distribution and severity of the pathology [4-6].
Currently, the most recognized genetic risk locus for PSP is the H1/H2 haplotype region covering the MAPT gene at chromosome 17q21.31 [7], where individuals carrying the common H1 haplotype are more likely to develop PSP, with an estimated odds ratio (OR) of 5.6 [8]. Previous studies usually ascribed the observed association in the H1/H2 haplotype to MAPT [7,9,10]. However, recent functional dissection of this region using multiple parallel reporter assays coupled to CRISPRi demonstrated multiple risk genes in the area in addition to MAPT, including KANSL1 and PLEKHM1 [11]. Genome-wide association studies (GWASs) in PSP have identified common variants in STX6, EIF2AK3, MOBP, SLCO1A2, DUSP10, RUNX2, and LRRK2 with moderate effect sizes [8,12-14]. In addition, variants in TRIM11 were identified as a genetic modifier of the PSP phenotype when comparing PSP with Richardson syndrome to PSP without Richardson syndrome [15].
To date, no comprehensive analysis of single nucleotide variants (SNVs), small insertions and deletions (indels), and structural variants (SVs) in PSP by whole genome sequencing has been conducted. To gain a more comprehensive understanding of the genetic underpinnings of PSP, we performed whole genome sequencing (WGS) and analyzed SNVs, indels, and SVs. As a result, we validated previously reported genes and unveiled new loci that provide novel insights into the genetic basis of PSP.

Study subjects
We performed WGS at 30× coverage for 1,834 PSP cases and 128 controls from the PSP-NIH-CurePSP-Tau, PSP-CurePSP-Tau, PSP-UCLA, and AMPAD-MAYO cohorts included in the Alzheimer's Disease Sequencing Project (ADSP, NG00067.v7) and used 3,008 controls from other cohorts in ADSP (Table S1) [16]. Control subjects self-identified as non-Hispanic white. WGS data are available on the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) [17]. Among the 1,834 individuals with PSP, 1,488 were autopsy-confirmed and 346 were clinically diagnosed. Thirty-four of the clinically diagnosed individuals had a subsequent autopsy, of which 29 confirmed PSP and five did not. These five subjects without PSP on autopsy were removed. We also removed related subjects (identity by descent > 0.25) and non-Europeans (subjects that were eight standard deviations away from the 1000 Genomes Project European samples [18,19] using the first six principal components (PCs)), resulting in 1,718 individuals with PSP and 2,944 control subjects. Of the 1,718 PSP individuals, 1,441 were autopsy-confirmed and 277 were clinically diagnosed (Table 1). Among the 1,718 PSP cases, 740 samples were included in previous GWASs (386 samples in the Höglinger et al. stage 1 analysis [8], 107 samples in the Höglinger et al. stage 2 analysis [8], and 247 samples in Chen et al. [13]). Among the 2,944 controls, 113 controls from the PSP-UCLA cohort (Table S2) were included in Chen et al. [13].
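The ancestry filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the threshold is applied separately to each of the first six PCs, measured in standard deviations of the reference European samples, and the function and argument names are hypothetical.

```python
from statistics import mean, stdev

def european_outliers(sample_pcs, reference_pcs, n_pcs=6, max_sd=8.0):
    """Return IDs of samples that lie more than max_sd reference standard
    deviations from the reference mean on any of the first n_pcs PCs.

    sample_pcs:    dict mapping sample ID -> list of PC coordinates
    reference_pcs: list of PC coordinate lists for the reference samples
    Per-PC thresholding is an assumption made for this sketch.
    """
    centers = [mean(ref[i] for ref in reference_pcs) for i in range(n_pcs)]
    spreads = [stdev(ref[i] for ref in reference_pcs) for i in range(n_pcs)]
    outliers = []
    for sample_id, pcs in sample_pcs.items():
        too_far = any(
            abs(pcs[i] - centers[i]) > max_sd * spreads[i] for i in range(n_pcs)
        )
        if too_far:
            outliers.append(sample_id)
    return outliers
```

A distance computed jointly over all six PCs would be an equally plausible reading of the text; the per-PC rule is used here only because it is the simplest to state.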
Considering that our sample set incorporated external controls from ADSP, initially collected for Alzheimer's disease (AD) studies, there was a potential selection bias for APOE ε4 and ε2 in controls. To rigorously validate our findings linked to APOE, we broke down the allele frequencies of APOE ε4 and ε2 by cohort (Table S2), reviewed the study design of each cohort, and created an additional sample set by excluding those cohorts with selection bias against APOE ε4 or ε2 (Supplementary Methods).

SNVs/indels quality controls
Only biallelic variants were included in the common (minor allele frequency [MAF] > 0.01) SNV/indel analysis. A biallelic site is a locus in the genome with two observed alleles, i.e., one reference allele and one alternative allele. Variants were removed if they were monomorphic, did not pass variant quality score recalibration (VQSR), had an average read depth ≥ 500, or if all calls had an alignment depth (DP) < 10 and a genotype quality (GQ) < 20. Individual calls with DP < 10 or GQ < 20 were set to missing. Indels were left-aligned using the GRCh38 reference [20,21]. Common variants with a missing rate < 0.1, 0.25 < allele balance for heterozygous calls (ABHet) < 0.75, and a Hardy-Weinberg equilibrium (HWE) test P-value in controls > 1 × 10⁻⁵ were kept, leaving 7,945,112 SNVs/indels for analysis. Similar quality control procedures were applied to rare variants (Supplementary Methods). Then, we calculated the heritability of PSP using GCTA-LDMS [22] for common SNVs/indels (MAF > 0.01) and for common plus rare SNVs/indels. A prevalence of 5 PSP cases per 100,000 individuals (0.00005) was used in the GCTA-LDMS analysis.
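The call-level and variant-level filters above can be illustrated with a short sketch. It is not the authors' pipeline: the record layout (`gt`, `dp`, `gq`, `ref_reads`, `alt_reads`) is hypothetical, ABHet is assumed to be the fraction of reference reads among heterozygous calls, and the VQSR, average-depth, and HWE filters are omitted.

```python
def qc_variant(calls, min_dp=10, min_gq=20, max_missing=0.1,
               abhet_bounds=(0.25, 0.75)):
    """Apply call masking and variant filters to one biallelic variant.

    calls: list of dicts with keys gt (alt-allele count 0/1/2), dp, gq,
           ref_reads, alt_reads (hypothetical field names).
    Returns (passes, genotypes), where masked genotypes are None.
    """
    # Set individual calls with DP < 10 or GQ < 20 to missing.
    genotypes = [
        c["gt"] if c["dp"] >= min_dp and c["gq"] >= min_gq else None
        for c in calls
    ]
    called = [g for g in genotypes if g is not None]
    if not called:
        return False, genotypes  # every call failed DP/GQ

    missing_rate = 1 - len(called) / len(genotypes)
    alt_count = sum(called)
    monomorphic = alt_count == 0 or alt_count == 2 * len(called)

    # ABHet: share of reference reads across retained heterozygous calls.
    hets = [c for c, g in zip(calls, genotypes) if g == 1]
    ref = sum(c["ref_reads"] for c in hets)
    alt = sum(c["alt_reads"] for c in hets)
    abhet_ok = True  # assumption: skip the check when there are no het reads
    if ref + alt > 0:
        abhet_ok = abhet_bounds[0] < ref / (ref + alt) < abhet_bounds[1]

    passes = (not monomorphic) and missing_rate < max_missing and abhet_ok
    return passes, genotypes
```

Masking low-quality calls before computing the missing rate matters here: a variant can fail the missing-rate filter solely because many of its calls were set to missing in the first step.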

Common SNVs/indels analysis
For association analysis, a linear mixed model implemented in the R package GENESIS [23] was used. The genetic relatedness matrix was obtained using KING [24]. PCs were obtained by PC-AiR [25], which accounts for sample relatedness. Sex and PC1-5 were adjusted for in the linear mixed model. Age was not adjusted for, as more than half (1,159 of 1,718) of PSP cases had missing age. SNVs and indels with P < 1 × 10⁻⁶ were reported along with WGS quality metrics, such as QualByDepth (QD) and FisherStrand (FS) (Table S3).
For the H1/H2 region, fine-mapping was performed using SuSiE [26]. We ran the analysis several times, with the assumed maximum number of causal variants ranging from 2 to 10. The only variant (rs242561) robust to the choice of the maximum number of causal variants was reported. For the major histocompatibility complex (MHC) region on chromosome 6, we imputed HLA alleles for HLA-A, HLA-B, HLA-C, HLA-DQB1, and HLA-DRB1 using CookHLA [27]. HLA alleles in linkage disequilibrium (LD) (R² > 0.1) with the most significant SNV (rs367364) in this region were reported (Table S9). Then, we used linear regression models and performed association analysis for each HLA allele, adjusting for sex and PC1-5 (Table S10). To avoid potential confounding effects (particularly for APOE alleles), we also performed association analysis (Table S4, Table S5) for SNVs/indels with P < 1 × 10⁻⁶ when excluding subjects from the three cohorts with selection bias against APOE alleles (ADSP-FUS1-APOEextremes, ADSP-FUS1-StEPAD1, and CacheCounty) along with cohorts with fewer than 10 subjects (NACC-Genentech, FASe-Families-WGS, and KnightADRC-WGS) (Table S2). We also performed additional experimental validation using TaqMan assay/Sanger sequencing to confirm the APOE genotypes observed from WGS (Supplementary Methods, Table S6).
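The LD screen for HLA alleles can be sketched as below. This is an illustration under stated assumptions: LD is approximated by the squared Pearson correlation between allele dosages (0, 1, or 2 copies per subject), which may differ from the haplotype-based estimate a dedicated tool would report, and the function names are hypothetical.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return sxy * sxy / (sxx * syy)

def alleles_in_ld(lead_snv_dosage, hla_dosages, threshold=0.1):
    """Return {allele name: r2} for alleles whose r2 with the lead SNV
    exceeds the threshold (0.1 in the text)."""
    result = {}
    for name, dosage in hla_dosages.items():
        r2 = dosage_r2(lead_snv_dosage, dosage)
        if r2 > threshold:
            result[name] = r2
    return result
```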

Rare SNVs/indels analysis
For aggregated tests of rare variants, we considered rare protein truncating variants (PTVs) and PTVs/damaging missense variants. Variants were annotated with ANNOVAR (version 2020-06-07) [28] and Variant Effect Predictor (VEP, version 104.3) [29]. PTVs were in protein coding genes (Ensembl version 104) [30] and had a VEP consequence of stop gained, splice acceptor, splice donor, or frameshift. Damaging missense variants were in protein coding genes (Ensembl version 104) and had a VEP consequence of missense, a CADD score ≥ 15, and a PolyPhen-2 HDIV prediction of probably damaging. Rare variants were selected based on a MAF < 0.01% in gnomAD and a MAF < 1% in our dataset. The number of alternative alleles in PTVs and PTVs/damaging missense variants was similar across sequencing centers, including when evaluated for loss of function intolerant genes (observed/expected score upper confidence interval < 0.35 [31]) (Fig. S14). We performed SKAT-O and gene burden testing (SKATBinary, method = 'burden') for PTVs and PTV/damaging missense variants (Supplementary Methods). We also considered only PTVs or PTVs/missense variants in loss of function intolerant genes (observed/expected score upper confidence interval < 0.35 [31]) when performing the tests. P-values were FDR corrected for the number of genes with a total minor allele count (MAC) ≥ 10. As SKAT-O does not calculate an odds ratio, we calculated the odds ratio of significant genes using logistic regression with the same covariates as SKAT-O and burden testing, and the same variant weights.
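The variant selection and multiple-testing steps above can be sketched as follows. The record fields (`gnomad_maf`, `cohort_maf`, `consequence`, `cadd`, `polyphen2_hdiv`) and the label `probably_damaging` are hypothetical names chosen for this illustration, and the Benjamini-Hochberg procedure is an assumption, since the text states only that P-values were FDR corrected.

```python
# Sequence Ontology terms that VEP reports for the PTV classes in the text.
PTV_CONSEQUENCES = {
    "stop_gained",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "frameshift_variant",
}

def is_qualifying(variant, include_missense=True):
    """True if a variant is rare and is a PTV or a damaging missense variant."""
    # MAF < 0.01% in gnomAD and < 1% in the study dataset.
    rare = variant["gnomad_maf"] < 0.0001 and variant["cohort_maf"] < 0.01
    if not rare:
        return False
    if variant["consequence"] in PTV_CONSEQUENCES:
        return True
    return (
        include_missense
        and variant["consequence"] == "missense_variant"
        and variant["cadd"] >= 15
        and variant["polyphen2_hdiv"] == "probably_damaging"
    )

def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

In this setting the correction would be applied only to genes with a total MAC ≥ 10, so `p_values` would hold one P-value per such gene.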
We evaluated the C1 module, a gene set that was previously shown to be composed of neuronal genes and enriched for common variants in PSP [32]. We performed a permutation test (N = 1000) of random gene set modules from brain expressed genes that contained the same number of genes as C1. From the Human Protein Atlas (www.proteinatlas.org) [33], brain expressed genes were defined as the union of unique proteins from the cerebral cortex, basal ganglia, and midbrain (N = 15,638). We calculated SKAT-O P-values from these random gene modules to determine the null distribution. We calculated the unadjusted odds ratio of significant genes or gene sets by summing the number of alternate alleles in the gene set among the total number of alleles in cases and controls. Normalized quantification (TPM) of gene expression across tissues was obtained from Genotype-Tissue Expression (GTEx) [34]. The expression of ZNF592 and the C1 module (summarized as an eigengene [35]) was plotted.
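The permutation test and the unadjusted odds ratio described above can be sketched as follows. This is a hedged illustration: `score_gene_set` is a hypothetical callable standing in for the SKAT-O computation on a gene set, and the `(count + 1) / (N + 1)` empirical P-value is a common convention assumed here rather than one stated in the text.

```python
import random

def unadjusted_odds_ratio(alt_cases, total_cases, alt_controls, total_controls):
    """Odds ratio from alternate-allele counts among all alleles in each group."""
    odds_cases = alt_cases / (total_cases - alt_cases)
    odds_controls = alt_controls / (total_controls - alt_controls)
    return odds_cases / odds_controls

def permutation_p(observed_p, set_size, background_genes, score_gene_set,
                  n_perm=1000, seed=0):
    """Empirical P-value for a gene set against size-matched random sets.

    score_gene_set: callable returning a P-value for a list of genes
                    (hypothetical stand-in for the SKAT-O test).
    """
    rng = random.Random(seed)
    background = list(background_genes)
    as_extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(background, set_size)  # same size as the module
        if score_gene_set(random_set) <= observed_p:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)
```

Drawing the random sets from brain expressed genes rather than from all genes keeps the null distribution matched to the tissue context of the module being tested.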

SV detection and filtering
For each sample, SVs were called by Manta (v1.6.0) [36] and Smoove (v0.2.5) [37] with default parameters. Calls from Manta and Smoove were merged by Svimmer [38] to generate a union of the two call sets for each sample. Then, all individual sample VCF files were merged by Svimmer as input to Graphtyper2 (v2.7.3) [38] for joint genotyping. SV calls after joint genotyping are comparable across samples and can therefore be used directly in genome-wide association analysis [38]. A subset of SV calls was defined as high-quality calls [38]. Details of the SV calling pipeline are described in our previous study [39]. For each individual SV reported, Samplot [40] or IGV [41] was used to keep only high-confidence CNVs and inversions that are supported by read depth or split reads; for insertions, we kept high-confidence insertions that are high-quality and not in the masked regions (Supplementary Methods).

SV analysis
For SV association, stricter sample filtering was applied: outlier samples with too many (larger than median + 4*MAD) CNV/insertion calls or too few (smaller than median - 4*MAD) high-quality CNV/insertion calls were removed. There were 4,432 samples (1,703 cases and 2,729 controls) remaining for PSP SV association analysis. Because more false positives would be picked up, the genomic inflation would be high (λ = 1.89, Fig. S9) if all SVs were included in the analysis. Therefore, we restricted our analysis to high-quality SVs only, which lowered the genomic inflation to 1.27 (Fig. S9). The 14,792 high-quality common SVs (MAF > 0.01) with call rate > 0.5 were included in the analysis. A mixed model implemented in the R package GENESIS was used for association. Sex, PCR information, SV PCs 1-5, and SNV PCs 1-5 were adjusted for in the mixed model. After association, we manually inspected deletions, duplications, and inversions with Samplot or IGV to keep only those with support from read depth, split reads, or insert size. For insertions, those not in masked regions were reported.
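The sample outlier rule and the genomic inflation factor mentioned above can be sketched as follows. Two assumptions are made: MAD is taken to be the unscaled median absolute deviation, and λ is computed as the median 1-df chi-square statistic divided by its expected median (about 0.4549). Function names are hypothetical.

```python
from statistics import NormalDist, median

def mad_bounds(values, k=4.0):
    """Return (median - k*MAD, median + k*MAD) using the unscaled MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center - k * mad, center + k * mad

def keep_samples(total_calls, high_quality_calls, k=4.0):
    """Drop samples with too many calls overall or too few high-quality calls.

    Both arguments map sample ID -> number of CNV/insertion calls.
    """
    _, upper = mad_bounds(list(total_calls.values()), k)
    lower, _ = mad_bounds(list(high_quality_calls.values()), k)
    return [
        s for s in total_calls
        if total_calls[s] <= upper and high_quality_calls[s] >= lower
    ]

def genomic_inflation(p_values):
    """Lambda: median chi-square(1) statistic over its expected median.

    P-values must lie in (0, 1]; each is converted through z = Phi^-1(p / 2).
    """
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / 0.4549364
```

A λ near 1 indicates test statistics close to the null expectation, which is why the drop from 1.89 to 1.27 after restricting to high-quality SVs is read as a reduction in false-positive calls.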
For SVs inside the H1/H2 region, all SVs, including those that are not high-quality, were considered. Then, we removed SVs with a missing rate > 0.5 and manually inspected deletions, duplications, and inversions with Samplot or IGV to keep only those with support from read depth, split reads, or insert size. For insertions, high-quality ones not in masked regions were kept for analysis. LD between SVs was calculated using PLINK (v1.90 beta) [42]. Rare SV burden in the H1/H2 region was evaluated by SKAT-O [43], adjusting for sex and PCs 1-5. As SKAT-O does not calculate an odds ratio, we calculated the odds ratio using logistic regression with the same covariates.

APOE and risk of PSP
One newly identified significant locus from our analysis is the well-known AD risk gene, APOE. We observed a significant association between the APOE ε2 haplotype and an elevated risk of PSP (P = 9.57 × 10⁻¹⁶, β = 0.87, MAF = 0.06, Table 2, Fig. S3B). The APOE ε2 haplotype, which is considered protective in AD, is encoded by rs429358-T and rs7412-T. The increased risk of APOE ε2 in PSP has been previously reported in a Japanese cohort, albeit with a relatively small sample size [46]. Furthermore, Zhao et al. [47] confirmed that APOE ε2 is linked to increased tau pathology in the brains of individuals with PSP and reported a higher frequency of APOE ε2 homozygosity in PSP, with an odds ratio of 4.41. Consistent with these findings, our dataset exhibited a higher frequency of rs7412-T homozygosity in PSP, yielding an odds ratio of 3.91.
Given that our dataset included external controls from ADSP collected for Alzheimer's disease studies, there were potential selection biases for APOE ε4 and ε2 in controls. To address this concern, we analyzed the allele frequencies of APOE ε4 and ε2 by cohort (Table S2) and indicated cohorts with potential selection bias. The association analysis excluding these cohorts shows that the ε2 SNV (rs7412, P = 1.23 × 10⁻¹², β = 0.70, MAF = 0.06) remained genome-wide significant and the ε4 SNV (rs429358, P = 0.02, β = -0.16, MAF = 0.14) was nominally significant (Table S4, Table S5). Despite removing ADSP controls with a potential selection bias for APOE ε4 and ε2, the allele frequency of APOE ε2 is still higher in external databases (AF = 0.0752-0.1060; Table 3) than in controls in our study (AF = 0.0454; Table S4). This indicates that there could still be additional factors affecting the collection of controls in ADSP.

Rare SNVs/indels and network analysis
The heritability of PSP for common SNVs and indels (MAF > 0.01) was estimated to be 20%, while that for common plus rare SNVs/indels was estimated to be 23% from our analysis using GCTA-LDMS [22]. Therefore, we performed aggregated tests for rare SNVs and indels, and identified ZNF592 (SKAT-O FDR = 0.043, burden test FDR = 0.041) with an OR of 1.08 (95% CI: 1.008-1.16) (Fig. 2, Table 4, Table S7) for protein truncating or damaging missense variants. There was no genomic inflation, with λ = 1.07 (Fig. 2). Risk in ZNF592 was imparted by 16 unique variants, with one splice donor and 15 damaging missense variants (Table S7). ZNF592 has not been previously associated with PSP but showed moderate RNA expression in the cerebellum compared to other tissues from GTEx (Fig. S8). There were no significant genes identified when evaluating PTVs only or when restricting to loss of function intolerant genes.
Considering that genes do not operate alone, but rather within signaling pathways and networks, we and others have shown that a better understanding of disease mechanisms can be achieved through gene network analysis [58-60]. Therefore, we scrutinized rare variants within a network framework, focusing on co-expression network analysis performed in PSP post mortem brain that had previously identified a brain co-expression module, C1, which was conserved at the protein interaction level and enriched for common variants in PSP [32]. We found this C1 neuronal module was significantly enriched with PSP rare variants (P = 0.006, OR [95% CI] = 1.31 [1.01-1.70], Table 4; Table S8). Genes from the C1 module were more likely to be loss of function intolerant compared to the background of all brain expressed genes (Fig. S8). To ensure that this association was not spurious, we performed permutation testing using random gene modules of brain expressed genes with the same number of genes as C1. The C1 module remains significant (Permutation P = 0.078). Exploring GTEx, we found that C1 genes are highly expressed in brain tissues including the cerebellum, frontal cortex, and basal ganglia (Fig. S8), consistent with regions affected in this disorder.

SVs in H1/H2 haplotype region
The H1/H2 region stands out as the pivotal genetic risk factor for PSP [8,67]. The H2 haplotype exhibits a reduced odds ratio of 0.19, as the allele frequency of the 238 bp H2-tagging deletion is 23% in controls and only 5% in PSP (P < 2.2 × 10⁻¹⁶). Moreover, our analysis identified five common (MAF > 0.01) and 12 rare deletions and duplications in the region (Table 6), ranging from 88 bp to 47 kb. Additionally, one common and four rare high-confidence insertions were reported in the region. Of the five common deletions and duplications (Fig. S12), three show genome-wide significant association with the disease (Table 5); four are located in regions with transposable elements (SVA, L1, or Alu) and are in LD (r² from 0.63 to 0.92) with the 238 bp H2-tagging deletion. This further highlights the important role of transposable elements in shaping the landscape of the H1/H2 region.
Among the 12 rare deletions and duplications (Fig. S13), five are located in potentially functional regions, such as splice sites, exons, and transcription factor binding sites (Table 6). In particular, one deletion (chr17:45993882-45993970) in exon 9 of MAPT was identified in a PSP patient, adding to previous reports of exonic deletions in MAPT in frontotemporal dementia, such as deletion of exon 10 [68] and exons 6-9 [69]. Using the SKAT-O test (N = 4,432), the 12 rare CNVs displayed a significantly higher burden in PSP than in controls (P = 0.01, OR = 1.64).
The APOE locus was of particular interest as it is a common risk factor for AD, explaining more than one third of the population attributable risk [70,71]. In contrast to AD, the ε4-tagging allele rs429358 was protective in PSP and the ε2-tagging allele rs7412 was deleterious. After removing ADSP controls with a potential selection bias for APOE ε4 and ε2, ε2 remained genome-wide significant and ε4 showed nominal significance. This observation is particularly intriguing since both AD and PSP have intracellular aggregated tau as a prominent neuropathologic feature. Notably, both the ε2 allele and the ε4 allele have been associated with tau pathology burden in the brain in mouse models [47,72], which raises the question of distinct tau species in 4R-PSP versus 3R-4R-AD. It is also notable that the ε2 allele is associated with increased risk for age-related macular degeneration (AMD), and the ε4 allele with decreased risk [73,74]. These results demonstrate that the same variant may have opposite effects in different degenerative diseases. This is especially important given the advent of gene editing as a therapeutic modality, and programs focused on changing APOE ε4 to ε2. Although this therapy would likely decrease risk for AD, our results indicate that it could increase risk for PSP, in addition to AMD. From this standpoint, caution is warranted in germ-line genome editing until the broad spectrum of phenotypes associated with human genetic variation is understood.
Burden association tests are highly valuable for addressing sample size limitations in analyzing rare variants [78]. Indeed, burden testing allowed us to identify ZNF592, a classical C2H2 zinc finger protein (ZNF) [79,80], as a candidate risk gene. ZNF proteins have been found to be causative for, or strongly associated with, a large number of neurodevelopmental diseases [81,82] and neurodegenerative diseases, including Parkinson's disease [83] and Alzheimer's disease [84,85]. ZNF592 was initially thought to be responsible for autosomal recessive spinocerebellar ataxia 5 in a consanguineous family with neurodevelopmental delay, including cerebellar ataxia and intellectual disability, due to a homozygous G1046R substitution [86]. However, further analysis of this family identified WDR73 as the most likely causative gene, consistent with Galloway-Mowat syndrome, although ZNF592 may have contributed to the phenotype [87].
We also extended classical gene-based burden analysis to consider rare risk burden in the context of a gene set defined by co-expression networks [32,88]. We leveraged combined previous proteomic and transcriptomic analysis of post-mortem brain from patients afflicted with PSP, and showed that rare variants are enriched in the C1 neuronal module, which was the same module enriched with common variants [32]. This, along with our recent work identifying a neuronally-enriched transcription factor network centered around SP1 disrupted by PSP common genetic risk, suggests that although PSP neuropathologically is defined by tufted astrocytes and oligodendroglial coiled bodies [6,89,90], initial causal drivers of PSP appear to be primarily neuronal.
In the analysis of SVs, we found that deletions in PCMT1 and IGH were significantly associated with PSP. The IGH deletions are in a complex region on chromosome 14 that encodes immunoglobulins recognizing foreign antigens. The size of the IGH deletion varies across individuals (Fig. S9). In addition, the IGH deletions can be accompanied by other deletions, duplications, and inversions (Fig. S9). Together, these features make experimental validation of the deletion challenging. The PCMT1 deletion is common (AF = 0.55), with an odds ratio of 8.38 for PSP in homozygous individuals.
There were limitations to this study. First, not all PSP cases were pathologically confirmed (of the 1,718 PSP individuals, 1,441 were autopsy-confirmed and 277 were clinically diagnosed). The specificity of the 1997 National Institute of Neurological Disorders and Stroke and Society for PSP (NINDS-SPSP) criteria [91] was shown to be 95% to 100% for probable PSP and around 80% to 93% for possible PSP [92-94]. The 2017 Movement Disorder Society PSP (MDS-PSP) clinical criteria were developed to improve the sensitivity for PSP patients with variant syndromes that were not reflective of PSP-Richardson syndrome [3]. The MDS criteria have shown a small decrease in specificity but improved sensitivity in clinicopathological studies [95,96]. Additionally, the majority of control samples in this study were from ADSP and were initially collected as controls for AD studies. Although samples with a potential selection bias for APOE ε4 and ε2 were removed, the allele frequency of APOE ε2 in controls was still lower than in external databases (Tables 3 and S4), indicating that there could be additional factors affecting the collection of controls in ADSP. For example, individuals with an AD family history might be more willing to volunteer as controls in ADSP, thereby contributing to the lower allele frequency of APOE ε2. To clarify this, future replication studies using independent datasets are needed to validate the effects of APOE ε4 and ε2 in PSP.
This work represents an important first step; future work is necessary to further delineate the rare genetic risk in PSP harbored in coding and noncoding regions. These results may come to fruition as additional genomic analytical methods are developed, sample sizes increase, and orthogonal genomic data are integrated. While PSP is rare, it is the most common primary tauopathy, and studying this disease is critical to understanding common pathological mechanisms across tauopathies. Further work to include individuals of diverse ancestral backgrounds will also improve our understanding of the genetic architecture of the disease.

Conclusion
In conclusion, this study significantly advances our understanding of the genetic basis of PSP through WGS. Previous GWAS signals were validated, and APOE ε2 was found to be the risk allele for PSP in the analysis of common SNVs and indels. Additionally, the analysis of rare SNVs/indels and SVs has revealed additional genetic targets, including ZNF592, IGH, PCMT1, CYP2A13, and SMCP, opening new avenues for investigating disease mechanisms and potential therapeutic interventions.

This project is supported by CurePSP, courtesy of a donation from the Morton and Marcine Friedman Foundation. We are indebted to the Biobanc-Hospital Clinic-FRCB-IDIBAPS and the Center for Neurodegenerative Disease Research at Penn for samples and data procurement.

Fig. 2
Association analysis of rare SNVs/indels. A Manhattan plot for genes with protein truncating variants or damaging missense variants. B Q-Q plot of gene P-values with protein truncating variants or damaging missense variants

The PSP genetics study group is a multisite collaboration including: German Center for Neurodegenerative Diseases (DZNE), Munich; Department of Neurology, LMU Hospital, Ludwig-Maximilians-Universität (LMU), Munich, Germany (Franziska Hopfner, Günter Höglinger); German Center for Neurodegenerative Diseases (DZNE), Munich; Center for Neuropathology and Prion Research, LMU Hospital, Ludwig-Maximilians-Universität (LMU), Munich, Germany (Sigrun Roeber, Jochen Herms); Justus-Liebig-Universität Gießen, Germany (Ulrich Müller); MRC Centre for Neurodegeneration Research, King's College London, London, UK (Claire Troakes); Movement Disorders Unit, Neurology Department and Neurological Tissue Bank and Neurology Department, Hospital Clínic de Barcelona, University of Barcelona, Barcelona, Catalonia, Spain (Ellen Gelpi; Yaroslau Compta); Department of Neurology and Netherlands Brain Bank, Erasmus Medical Centre, Rotterdam, The Netherlands (John C. van Swieten); Division of Neurology, Royal University Hospital, University of Saskatchewan, Canada (Alex Rajput); Australian Brain Bank Network in collaboration with the Victorian Brain Bank Network, Australia (Fairlie Hinton); Department of Neurology, Hospital Ramón y Cajal, Madrid, Spain (Justo García de Yebenes). The acknowledgement of PSP cohorts is listed below, whereas the acknowledgement of ADSP cohorts for control samples can be found in the supplementary materials. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal (https://gtexportal.org/home/datasets) on 1/27/2022. We also thank Drs. Murray Grossman and Hans Kretzschmar for their valuable contribution to this work. AMP-AD (sa000011) data: Mayo RNAseq Study. Study data were provided by the following sources: The Mayo Clinic Alzheimer's Disease Genetic Studies, led by Dr. Nilufer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, Jacksonville, FL, using samples from the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer's Disease Research Center, and the Mayo Clinic Brain Bank. Data collection was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 AG017216, R01 AG003949, NINDS grant R01 NS080820, CurePSP Foundation, and support from Mayo Foundation. Study data include samples collected through the Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona. The Brain and Body Donation Program is supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for Parkinson's Disease and Related Disorders), the National Institute on Aging (P30 AG19610 Arizona Alzheimer's Disease Core Center), the Arizona Department of Health Services

Table 1
Characteristics of study participants
SD: standard deviation; AF: allele frequency
a APOE ε4 is represented by the genotypes of rs429358-C
b APOE ε2 is represented by the genotypes of rs7412-T
c H2 haplotype is determined by the genotypes of rs8070723-G

Table 2
Top associations from genome-wide association study
Chr: chromosome; Ref: reference allele; Alt: alternative allele; AF: allele frequency
* The SNV regulates multiple genes; the gene with the smallest P-value is shown (eQTL/sQTL for the brain region was obtained through GTEx)
a SNVs with significant eQTL hits
b SNVs with significant sQTL hits
c SNVs with both eQTL and sQTL hits

Table 4
Association analysis of ZNF592 and the C1 module

Table 6
High-confidence structural variants in the H1/H2 haplotype region
AF: allele frequency; N: number of individuals with non-missing genotypes
* High-quality SVs that were included in association analysis
a SVs with DNA samples available and PCR validated